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ABSTRACT 


astro-ph.HE] 28 Jun 2023 


u We present the results of the search for an isotropic stochastic gravitational wave background (GWB) at nanohertz frequencies using the second 

data release of the European Pulsar Timing Array (EPTA) for 25 millisecond pulsars and a combination with the first data release of the Indian 

= Pulsar Timing Array (InPTA). A robust GWB detection is conditioned upon resolving the Hellings-Downs angular pattern in the pairwise cross- 

> correlation of the pulsar timing residuals. Additionally, the GWB is expected to yield the same (common) spectrum of temporal correlations across 

vp pulsars, which is used as a null hypothesis in the GWB search. Such a common-spectrum process has already been observed in pulsar timing data. 

—^4 We analysed (i) the full 24.7-year EPTA data set, (ii) its 10.3-year subset based on modern observing systems, (iii) the combination of the full 

(N data set with the first data release of the InPTA for ten commonly timed millisecond pulsars, and (iv) the combination of the 10.3-year subset 

\© with the InPTA data. These combinations allowed us to probe the contributions of instrumental noise and interstellar propagation effects. With the 

= full data set, we find marginal evidence for a GWB, with a Bayes factor of four and a false alarm probability of 4%. With the 10.3-year subset, 

* we report evidence for a GWB, with a Bayes factor of 60 and a false alarm probability of about 0.1% (= 3o significance). The addition of the 

InPTA data yields results that are broadly consistent with the EPTA-only data sets, with the benefit of better noise modelling. Analyses were 

© performed with different data processing pipelines to test the consistency of the results from independent software packages. The latest EPTA data 

from new generation observing systems show non-negligible evidence for the GWB. At the same time, the inferred spectrum is rather uncertain 

' and in mild tension with the common signal measured in the full data set. However, if the spectral index is fixed at 13/3, the two data sets give a 

similar amplitude of (2.5 +0.7) x 1075 at a reference frequency of 1 yr! . Further investigation of these issues is required for reliable astrophysical 

e = interpretations of this signal. By continuing our detection efforts as part of the International Pulsar Timing Array (IPTA), we expect to be able to 
improve the measurement of spatial correlations and better characterise this signal in the coming years. 


e 
= Key words. gravitational waves — methods:data analysis — pulsars:general 


1. Introduction cies, where stellar mass compact binary mergers leave their im- 
. - . print, a variety of astrophysical and cosmological phenomena 
The first direct gravitational wave (GW) detection (Abbott et al.| are expected to generate GWs over a much broader frequency 


2016) marked the beginning of a new era in the exploration Of ^ spectrum, reaching down to the nanohertz regime and beyond. 
the Universe. Although terrestrial interferometers such as LIGO, 


Virgo, and KAGRA are sensitive to GWs at kilohertz frequen- Supermassive black holes (SMBHs) are ubiquitous in galax- 
ies and there 

* sychen@pku.edu.cn is growing evidence that some of them formed when the Uni- 
** yjguo@mpifr-bonn.mpg.de verse was less than a gigayear old (e.g.{Wang et al.[2019] 2021). 
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According to the established cold dark matter cosmological sce- 
narios, galaxy formation proceeds in a hierarchical fashion, with 
small galaxies merging with each other over cosmic history to 
build progressively larger structures (White & Rees||1978). If 
these galaxies host SMBHs in their centres, in the aftermath 
of the merger, SMBH binaries (SMBHBs) will inevitably form 
(Begelman et al.[1980). Adiabatically inspiralling SMBHBs are 
anticipated to be the loudest sources of GWs at nanohertz fre- 
quencies. The incoherent superposition of their emitted GW sig- 
nals forms a stochastic GW background (GWB) whose ampli- 
tude and spectral index relate to the galactic merger history of the 
Universe and to the dynamical properties of the emitting binaries 
(Jaffe & Backer|7003) [Sesana et al 008] Sesanai2013). Besides 
SMBHBs, a stochastic GWB can be produced by a number of 
other physical processes potentially occurring in the early Uni- 
verse, including non-standard inflationary fields (Guzzetti et al. 
(2016), first-order phase transitions 
mological defects such as a network of cosmic strings 
[& Vilenkin[2000). A comprehensive overview of these phenom- 
ena can be found in Caprini & Figueroa|(2018). 

Currently, this very low-frequency GW regime can only be 
accessed with pulsar timing arrays (PTAs, 
[1990). The technique of pulsar timing relies on the exceptional 
rotational stability of a particular population of neutron stars, 
the millisecond pulsars (MSPs). The times of arrival (TOAs) of 
radio pulses observed at the telescope are measured precisely 
using maser clocks referenced to the international atomic time. 
A model, known as a phase-connected timing solution, is then 
used to account for every rotation of the pulsar for the entire se- 
ries of TOAs (see[Lorimer & Kramer|2004] for a detailed expla- 
nation). The pulsar timing technique has allowed high-precision 
measurements that have led to several significant breakthroughs, 
including the first indirect detection of GWs through the mea- 
sured orbital shrinkage of PSR B1913+16 
[1989). In a PTA a network of the most stable MSPs is observed 
regularly and the TOAs are modelled. It is within the small de- 
viations from the model (the residuals) that nanohertz GWs can 
be searched for. 

The idea of using MSPs to detect nanohertz GWs was first 
proposed by and (1979). The distor- 
tions in spacetime caused by a GW propagating over the Earth 
or over a pulsar lead to stochastic advances or delays in the 
TOAs. Astrophysical sources produce GWs that cause larger 
delays over longer timescales, that is, a temporally correlated 
(red) signal. However, disentangling the GW signal from other 
red noise sources, such as variations in the interstellar medium 
(ISM) or intrinsic pulsar spin noise, with a single pulsar is im- 
possible. Foster & Backer|(1990) were the first to suggest a PTA 
as a method to overcome this problem. Not only would the GW 
signal result in a common red signal (CRS) in all pulsars, but 
the signal would be spatially correlated across the sky. This cor- 
relation is related to the quadrupolar nature of GWs. Although 
GWs passing over the individual pulsars would not be correlated, 
those propagating over the Earth would be. When the degree of 
correlation for each pair of pulsars is plotted against their angu- 
lar separation, this results in the Hellings and Downs curve (HD, 
[Hellings & Downs|1983). It is this HD curve that allows us to 
distinguish the GWB from other potential sources of correlated 
signal (e.g. the modelling of Earth's motion in the Solar system 
and local clock instabilities, /Tiburzi et al.[2016). 


The European Pulsar Timing Array (EPTA, 
2013) was formed in 2004 to facilitate the detection of 


GWs. However, it uses pulsar observations taken well before its 
formal creation, some of which were specifically for PTA-style 
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analysis. The EPTA data are provided by some of the largest 
radio telescopes in Europe: the Lovell telescope at the Jodrell 
Bank Observatory, the Nancay decimetric radio telescope, the 
Westerbork synthesis radio telescope, the Effelsberg 100 m ra- 
dio telescope, and the Sardinia radio telescope. These telescopes 
supply independent data sets at a range of observing frequencies 
(see the EPTA Collaboration|2023), but since 2009 they have 
also worked together as the Large European Array for Pulsars 
(LEAP), a coherently phased interferometer with an equivalent 


dish diameter of up to 194 m (Bassa et al.|2016). 

The earliest EPTA data date back to 1994, with most pulsars 
having over 15 years of data. The bandwidth available and the 
backends used to record the data have improved over the years. 
While only some coherently dedispersing backend systems were 
used initially, considerable upgrades were made around 2005- 
2010, when most telescopes switched to broadband, coherent 
dedispersion systems. In addition to offering a wide range of ob- 
serving frequencies and high observing cadence (with weekly or 
even shorter spacings between successive observations), multi- 
ple telescopes also allow the data sets to be checked against each 
other, highlighting any local issues. This is crucial for reliability. 


THE EPTA is a founding member of the International Pul- 


sar Timing Array (IPTA, 2009), along with the 
Parkes Pulsar Timing Array (PPTA) which uses the Parkes tele- 


scope (Manchester|2006| Hobbs|2013), and the North American 
Nanohertz Observatory for Gravitational Waves (NANOGrav), 
which uses data from the Arecibo observatory, Green Bank radio 


telescope, and Very Large Array (Jenet et al.|2009 
2015). Recently, the Indian Pulsar Timing Array (InPTA, 


using the Giant Meterwave Radio Telescope, 2018) 
has joined the IPTA as a full member, while the Chinese Pulsar 
Timing Array (CPTA, using the Five Hundred Meter Spherical 


Telescope, Jiang et al.|2019), and the MeerTIME programme 
(using the MeerKAT telescope, [Bailes et al.[2020) have become 
observer members. 

The first EPTA data release (DR1) was made in 2015 
gnes et al.|2016) and was used to place an upper limit on the 
GWB (Lentati et al.[2015). However, during the analysis of the 
six best pulsars in the array, a weak CRS was observed in the 
data. An analysis of the same pulsars in 2021 (Chen et al./2021) 
allowed for a direct comparison with the earlier work and clearly 
showed that not only was the CRS still present in the data, but 
also that its properties could be significantly better constrained. 
This CRS is consistent with the findings of the NANOGrav 12.5- 


year data set (Arzoumanian et al 2020), PPTA DR2 (Goncharov 
2022a), and IPTA DR2 (Antoniadis et al.[2022). However, 


six pulsars do not provide enough pairs to sufficiently sample 
the HD curve, the crucial signature of GWs. To this end, EPTA 
Data Release 2 (DR2, has been 
created using 25 MSPs optimally selected among those timed by 
the collaboration, following the method described in [Speri et al.] 
(2023), 

In this paper, we present the results of the search for a 
stochastic GWB at nanohertz frequencies in the EPTA DR2. A 
summary of the data set and noise models is given in Section [2] 
For more details, we refer interested readers to the companion 
papers (the EPTA Collaboration[2023| 
2023), respectively. In Section |3| we briefly review 
our analysis methods, which are similar to those used in the six- 
pulsar analysis presented in|Chen et al.|(2021). Our main results, 
including a comparison of the full DR2 data set against a reduced 
data set that includes only the new generation backend systems, 
are presented in Section [4] In Section [5]we discuss the addition 


EPTA+InPTA: GWB Search 


of InPTA data and its impact on the GWB search, and draw our 
conclusions in Section [6] 


2. Data set and noise models 


The DR2 includes observations of 25 pulsars selected from the 
DR1 source list. These data were collected with six EPTA tele- 
scopes, including LEAP. The DR2 data set is a combination of 
data from DR1 with those recorded with a new generation of data 
acquisition systems which offer significantly wider bandwidth 
and thus greater sensitivity. The DR2 data set offers a variety of 
time spans for different pulsars, from a minimum of 14 to a max- 
imum of 25 years. That data set also has a broad observing fre- 
quency coverage, starting from (~300 MHz) and extending up to 
(^4 GHz). A subset of DR2, using data for six pulsars was used 
for the common red noise process search presented in|Chen et al.] 
(2021). Since then, our multi-telescope data have allowed us to 
detect and correct an issue in the clock corrections applied to the 
data collected with the ‘NUPPI’ pulsar backend 
at the Nancay Radio Telescope. More details of the EPTA 
DR2 data set and timing analysis results can be found in our data 
release paper (he EPTA Collaborationp073) 

For the analysis presented in this paper, we used two versions 
of the DR2, the full data set and a truncated version that features 
data collected with the new generation of pulsar backends only. 
These are extended by incorporating data from the first InPTA 
data release for an overlapping set of ten 
pulsars. The InPTA data set was obtained using the upgraded 
Giant Metrewave Radio Telescope (uGMRT) from MJD 58235 
to 59496 covering about 3.5 years. It complements the EPTA 
data with simultaneous observations in the 300-500 MHz and 
1260-1460 MHz bands and adds about 0.7 years to the EPTA 
time span. To summarise, we analyse the following four data 
sets, additional details of which can be found in the EPTA data 


release paper (the EPTA Collaboration|2023) and the accompa- 
nying pulsar noise analysis paper (the EPTA and InPTA Collab- 
2023): 


1. DR2full. The complete EPTA DR2 data set, covering 24.7 
years of data; 

2. DR2new. A reduced version of the entire data set, including 
only the last 10.3 years of data, collected with new genera- 
tion wide-band backends; 

3. DR2full+. The same as DR2full, but with the addition of 
InPTA data for ten pulsars, covering 25.4 years of data; 

4. DR2new+. The same as DR2new, but with the addition of 
InPTA data for ten pulsars, covering 11.0 years of data. 


The new generation backends use improved hardware for the 
conversion of the electric signals to digital data streams and al- 
low for coherent dedispersion during the observations, whereas 
previous systems mostly operated with incoherent dedispersion. 
The increased processing power also enables us to use up to four 
times the frequency bandwidth as compared to the older, legacy 
backends. 

Before a correlated signal can be searched for, the determin- 
istic properties of individual pulsars need to be modelled. This 
includes the spin, astrometric, orbital (for binaries), and noise 
parameters of the pulsar (the EPTA Collaboration|2023). Single 
pulsar noise models for the data sets mentioned above have been 
obtained from a specific model selection scheme presented in 
[the EPTA and InPTA Collaborations) (2023). For all pulsars, the 
timing model parameters were analytically marginalised and the 
white noise parameters EFAC and EQUAD were set at fixed val- 
ues; cf. Section B] The TOAs are measured by averaging over 


a frequency range in which the pulse profile can be consid- 
ered stable. For the legacy systems, the full bandwidth of about 
128 MHz was used. However, for the new generation backends 
with larger bandwidths, we split the observation into sub-bands, 
treating each sub-band independently with a template and offset. 
This method allowed us to reduce the number of TOAs while 
retaining most of the information from the observations. With at 
most four TOAs per observation and due to the significantly low- 
ered sensitivity as a result of the sub-banding we could not mea- 
sure significant, time-correlated white noise. Thus the ECORR 
parameter, which describes the presence of such noise, was not 
included in the analysis. Model selection was applied for other 
time-correlated signals, allowing for the selection of the most 
favoured combination among observing-frequency independent 
red noise (RN), dispersion measure variations (DM), and scat- 
tering variations (SV). These correspond to stochastic signals 
that induce a delay in the timing residuals with a chromatic in- 
dex k of zero, two, and four, respectively, which characterises 
the dependence on the observing radio frequency v^*. Two large 
events were observed in PSR J1713+0747, one at MJD ~ 54757 


(Coles et al.[2015 ) and 
one at MJD ~ 57510 (Lam et al.12018| |Goncharov et al./202 1a} 
Chalumeau et al. . These were assumed to be caused by 


sudden changes in the scattering and dispersion variation and 
modelled as deterministic signals with fixed chromatic indices 
k = 4 and k = 2, respectively, as obtained from a Bayesian fit 
to the data. While both events are spanned by DR2full, only the 
second event falls within the time-span of DR2new. 

Each noise process is modelled as a sum over Fourier compo- 
nents. Following|Chalumeau et al.|(2022), we did not fix a priori 
the number of Fourier components of the various processes in the 
noise analysis. Instead, for all combinations of RN, DM, and SV, 
we determined the optimal number of Fourier components that 
best describes the data. We did not consider models that include 
SV but not DM, as this is not physically motivated. We obtained 
customised noise models for each pulsar from a Bayes factor 
(BF) evaluation among the candidate models and performed a 
final analysis by refitting the timing model parameters simulta- 
neously with these favoured noise models. This enabled further 
refinement of the timing model parameters and a more reliable 
evaluation of the white noise parameters, which are subsequently 
fixed in the GWB analyses. The interpretation of custom noise 
models and further discussion on the robustness of these results 


are presented in|the EPTA and InPTA Collaborations|(2023). 


3. Methods 
The analysis methods closely follow those of (2021) 


and references therein. In the following, we summarise the key 
components of the analysis and provide details of additional 
analyses included in this work. 

The PTA likelihood function for a CRS search is given by 


(van Haasteren et al.[2009) 


63 Xu 6 (9t) 
Ii 
The post-fit residuals of pulsar J at observation i are denoted as 
ot;; and C=C ar Ccns is the sum of the block diagonal covari- 
ance matrices for all pulsars!) and the overlap reduction function 
(ORF) I multiplied by the covariance matrix for the correlated 


(t 


Lpta « 


! To avoid Č becoming singular, it is regularised as [o = |C*| x 
|M‘7C*"'M"|, where M' is the design matrix of all pulsars. 
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common red signal Ccrs. The timing model parameters are an- 
alytically marginalised OU e mee dem CU 
(2014) for more details. 

The covariance matrix for each pulsar contains information 
on the white and red noise components RN, DM, and SV. Mea- 
surement uncertainties on the TOAs can be calibrated with a pair 
of white noise parameters, EFAC and EQUAD, for each tele- 
scope, receiver and backend combination to modify the initial 
estimate from the instrument and TOA extraction method, 


07, = (ij X EFAC)” + EQUAD? (2) 


The red noise power spectra were modelled with a power law 


(3) 


S RN 


A? f =A yr 
~ 1272 (4) 
representing long-term variations in the ToAs which are inde- 
pendent of the radio frequency of the observations. This model 
was used for both, individual pulsars as well as for any putative 
common red noise. 

Propagation of the radio signals through the interstellar 
medium adds delays that depend on the frequency of the radio 
photons. Following|Chalumeau et al.|(2022) we considered two 
types of processes; DM variations and scattering of the photons 
by electrons encountered along the line of sight between the pul- 
sar and Earth. These were also modelled with power laws 


A? AL yr? 


Ss , 
EO 2m vk \ lyr] T 


(4) 


where K is the DM constant at a reference frequency of 1 MHz, 
k is the chromatic index of two or four for DM and SV, respec- 
tively, and v is the radio frequency of the propagating photons. 

The frequencies f of the Fourier basis were chosen to be 
fn =n/T (n = l, ..., N), where T is the time interval between the 
first and last observations, and N is the number of frequency bins 
considered. This number was customised for each noise process 
in each individual pulsar, as described in the companion noise 
analysis paper (the EPTA and InPTA Collaborations 023), 

For the common red signals, we used two methods to deter- 
mine the optimal number of frequency bins: 1) we fitted a broken 
power law to estimate the frequency where the red noise became 
dominant over the white noise (Arzoumanian et al.[2020); and 2) 
we constructed a free-spectrum model, where the power at each 
frequency was modelled with an independent parameter 
[Arzoumanian et al 2020) 

For the detection of the GWB, the characteristic spatial cor- 
relation described by the HD curve 


3 xy 1 1l 
Vows (Gy) = Say nx — “2 + = + = Oxy, 


2 4 2 2 ©) 


is the key criterion. Here, ¢7; is the spatial angular separation 
between pulsars J and J, xz; = [1 — cos(Z7;)]/2, and ó(x;;) is the 
Kronecker delta function. 

We employed three types of model to search for generic spa- 
tial correlations in the data to compare against the expected HD 
correlation from a GWB: 


1. a binned correlation function (Taylor & Gair|2013), where 


we weighted the evidence for the pulsar pair correlations in 
ten bins of angular separations between pulsars; 


Article number, page 4 of 23 


2. a Chebyshev polynomial decomposition to the third order 
(Lentati et al.|2015| |Chen et al.|2021) 


(6) 


where yr; = (£r; ^ 1/2)/ (1/2) and c; are the Chebyshev poly- 
nomial parameters whose priors are uniform in the range 
[-1, 1]. The cross-correlation is normalised so that I(x) € 
[-1, 1]. This decomposition can be used to compare against 
constraints from previous EPTA data sets. 

3. a Legendre polynomial decomposition to fifth order 


et al.|2014} 


Try) & c1 + cyr + e3Qyr; — 1) + ea(4y;; - 3yr), 


5 
rén) = 9 liPcos £j), (7) 


i=0 


where P; are the Legendre polynomial functions of order i 
and /; are the Legendre polynomial parameters whose pri- 
ors are uniform in the range [—1, 1]. The parameters can be 
interpreted as the amount of power in the monopole i = 0, 
dipole i = 1, quadrupole i = 2, and higher modes. A pure 
GWB would have no monopole or dipole contributions (i.e. 
lọ = h = 0) in this decomposition with all power at i > 2. 


The off-diagonal elements of the covariance matrix encode 
the information on cross-pulsar correlated common signals. 
Apart from the quadrupole HD, we tested for the presence of 
a monopole (associated with, e.g. clock time errors) and a dipole 
(associated with, e.g. systematics in the model of the position of 
the Earth, the Solar system ephemeris, SSE) term. Unless other- 
wise stated, all analyses were performed with a fixed SSE model, 
DE440, produced by (2021). To check for possible 
SSE systematics, we performed additional analyses using the 
BAYESEPHEM model 
2020). 

Using only the diagonal terms of the covariance matrix al- 
lows for fast computational analysis and corresponds to a model 
without any spatial correlations, which we refer to as common 
uncorrelated red noise (CURN). It is also possible to use only the 
cross-terms to search for a common signal in a split-likelihood 
analysis [Antoniadis et aL[2022). 1f 
the posterior distribution of the uncorrelated model has a sub- 
stantial number of samples that are within the support of the 
correlated model, it is possible to employ the reweighting for- 
malism, which was introduced in|Hourihane et al.|(2022), to ap- 
proximate the posterior of the correlated model. The reweighting 
process involves sampling from the posterior distribution of the 
uncorrelated model (CURN) and then adjusting the weights of 
the obtained samples to reflect their likelihood under the corre- 
lated model. This technique enables the posterior of a correlated 
model to be obtained efficiently, the Bayes factor between the 
two models to be obtained, and the quality of the reweighted 
samples to be quantified. 


3.1. Bayesian analysis 


We estimated the parameters by evaluating the posterior prob- 
ability, which is proportional to the likelihood given by Equa- 
tion[I] multiplied by the prior. The inverse of the proportionality 
coefficient is referred to as Bayesian evidence and it is equal to 
the integral of the likelihood times the prior over the prior range. 
When searching for a GWB, we fixed the white noise parame- 
ters of each pulsar to the maximum likelihood values produced 


EPTA+InPTA: 


Table 1: Prior ranges for the parameters of the power laws used in 
the analysis: amplitude, A, and spectral index, y. Subscripts RN, 
DM, and CRS denote the red noise, DM noise, and common red 
signal, respectively. 


Parameter Prior Type Range 
Arn, Apm, Acrs || log-Uniform | [107!5 — 107!°] 
YRN> YDM; YCRS Uniform [0-7] 


by the single pulsar noise analysis (the EPTA and InPTA Col- 


laborations|2023) and we simultaneously evaluated the RN, DM, 
and SV of each individual pulsar and the CRS. The prior ranges 


adopted for these parameters is given in Table [1] Bayesian evi- 
dence was used for model selection, where the Bayes factor for 
one hypothesis over the other is equal to the ratio of the two ev- 
idence values corresponding to these hypotheses. The posterior 
evaluation was done mainly with m with other 
es srl. E TS 2 checking: m3c2 See 2023), Eryn 
, and (Williams etal 2021). 

eee this P we er med model seat, in two ways. 
First, directly, by introducing a hyperparameter that switches be- 
tween likelihoods corresponding to the two models. The ratio of 
the fraction of samples using one model to the fraction using the 
other model is the Bayes factor. This method i a as the 
Ao space sampling method (Carlin & Chib[1995] Hee et al.] 
. The second method involves ADT om the CURN 
odo E is mentioned above and described in detail in [Houri-] 


1.2022). 


3.2. Frequentist analysis 


We also used the frequentist optimal statistic (OS) framework, 


developed by (2008), [Demorest etal (2013) and 
‘Chamberlin et al.| (2015) with the noise marginalisation intro- 
duced by (2018), to compare against the Bayesian 
results. The Bayesian output of a CURN analysis was used as 


the input for the OS. This allowed for high computational effi- 
ciency, as the posterior distributions of pulsar noise were directly 
used to compute the statistics. With the OS we can compute the 
amount of correlated power for each pulsar pair. Comparing this 
correlation against different models gives signal-to-noise (S/N) 
estimates for different types of spatial correlations. 


3.3. Software packages 


As in E ), we used both ENTERPRISE?| oo et al. 
[2020] T: Taylor et al.|2021) and FORTYTW| P (Caballero et al.|2016) 


for the PTA Ae computation and cross-check " main 
results with both pipelines. Some of the more specific analyses 
were performed only with ENTERPRISE for computational cost 
efficacy since we have demonstrated in [Chen et al.| (2021) that 
the two pipelines produce broadly consistent results. 


4. Search results on the EPTA data sets DR2full 
and DR2new 


We first present results for the analysis carried out with EPTA- 
only versions of the data set, namely DR2full and DR2new. Re- 


https: fie in2p3.fr/epta/enterprise and |https:// Ihttps://| 


ttps: Ent ub.com/caballero-astro/fortytwo 


GWB Search 


sults for the EPTA+InPTA data set analysis are presented in Sec- 
tion 5] 

For simplicity and efficiency, our general analysis setup uses 
the DE440 Solar system ephemeris fit. The starting values for 
the marginalisation of the timing model are taken from the tim- 
ing analysis (the EPTA Collaboration[2023). Pulsar noise mod- 
els and observing system white noise parameters are taken from 
the single abet noise analysis 

Pes A analyses (e.g. [Arzoumanian et al. [2020) [Anto-] 
[niadis et al.|2022) have shown the importance of choosing the 


optimal number of frequencies to model any putative common 
signal and that most of the power of a common red signal can 
be found in the lowest frequency bins. We choose the width of 
the frequency bin to be 1/T, where T is the time span of the data 
set. For the power law the lowest 24 (9) frequency bins are used 
to model the CRS in the DR2 full (DR2new) data set, which cor- 
responds to a maximum frequency fa = 24/24.7 yr !(9/10.3 
yr?) respectively. We subsequently used these limiting frequen- 
cies for the remaining analyses (unless otherwise specified). This 
choice has been verified with a broken power law analysis, which 
shows that using a larger number of frequency bins does not im- 
pact the recovery of the parameters of the CRS. 

We show the posterior distributions of the free-spectrum 
model for the first 50 (20) frequency bins for the DR2full 
(DR2new) data set in Figure[I] The most noticeable difference is 
in the lowest constrained frequency bin. Extending the DR2new 
only best fit power law to lower frequencies would result in a 
lower CRS in the 1/24.7 yr^! bin, compared to what is measured 
in DR2full. Analogously, fitting a power law to the DR2full 
data set excluding the lowest frequency bin could give con- 
straints that are more consistent with the new data set. This could 
be due to the non-stationarity of the processes involved, devia- 
tions from the power law model or some additional unmodelled 
noise in the full data set. Further investigations are needed to 
understand better this difference and if and how it could be miti- 
gated. 


4.1. Spectral parameter constraints 


The right panel of Figure [I] shows the posteriors of the recov- 
ered parameters for a power law model for a HD correlated 
process. The recovered power law parameters with DR2new are 
logarithmic amplitude log;y A = —13. 94*02 a and spectral index 


y= 2. (ila 5 (90% credible regions). The poda index is shal- 
lower than the expected mean value of 13/3 from a population 
of circular SMBHBs, which still lies within the 3c credible re- 


gion (also see Middleton et al.[2021). With DR2 l the recov- 


ered logarithmic amplitude ee = -14. 26 4; and spectral 


index y = 4. 19575 is closer to 13/3. The two data sets give 
consistent results in the sense that the two posteriors overlap and 
lie on the same A — y degeneracy line that corresponds to fix- 
ing the total HD-correlated power. Therefore, the HD-correlated 
power measured in DR2full and DR2new is comparable, al- 
though the spectral shape is not well constrained and appears 
to be different in the two data sets. In support of this statement, 
when fixing the spectral index to 13/3, the background ampli- 
tude inferred from the two data sets 1s consistent, with a value of 


logio A = -14. 6140 i 


Table 2| p] gives the 90% credible regions of the recovered 
power law parameters for different analyses and models. Once 
the data set is fixed, the CRS (CURN or GWB) parameter 


constraints are consistent between different software packages, 
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Fig. 1: Spectral properties of a CRS assuming HD correlations. The left panel shows the free spectrum, the independent measurement 
of common power at each frequency bin, for the two versions of the EPTA-only data set. The right panel shows the 1/2/30 contour 
of the 2D posterior distribution of amplitude and spectral index when modelling the spectrum with a power law. In both panels, 
results for DR2 full are in blue, while those of DR2new are in orange. The solid lines in the left panel are the power law best-fits to 
the GWB (see main text for the parameters of the fit), while the vertical dashed line indicates the position of f = 1 yr^!. The vertical 


dashed line in the right panel denotes y = 13/3. 


Table 2: 90% credible regions for the power law parameters constraints in the different Bayesian analyses with DE440 for both 
DR2full and DR2new. The analyses included the search for common uncorrelated red noise (CURN), gravitational wave background 
(GWB), and a common correlated signal with overlap reduction function (ORF) modelled with different methods (binned ORF, 


Chebyshev ORE, and Legendre ORF). 


DR2full DR2new 

Software + Model logio Acrs YCRS logio Acrs YCRS 
ENTERPRISE + CURN -1453'02 4.13708) | -14.0032 2.91752 
FORTYTWO + CURN —14.52*0-4) 4327075 | -14.007327 2.917535 
ENTERPRISE + GWB -14.84*|/3 4.197073 | -13,.944023 2/308 
FORTYTWO + GWB -14.53'02 4167072 | —13.94'022. 2.71132 
ENTERPRISE + Binned ORF | -14.47*027. 4.1040% | —13.89*022. 2.63*050 
FORTYTWO + Binned ORF -1449'02  411:07 | -13.874022 2,5807 
ENTERPRISE + Chebyshev ORF | -14.50'022. 4.177075 | -13.871937 2.57036 
ENTERPRISE + Legendre ORF | —14.517020  4.19*077 | —13.89*023 — 2,597038 


Table 3: Z-score (in number of o) produced by the tensiometer package, detailed in (2021) when comparing 


posteriors produced by various data sets and software packages. The second column compares the posteriors between the DR2new 
and DR2 full data set while employing ENTERPRISE and FORTYTWO (in brackets). On the contrary, the third column compares the 
posteriors given by the ENTERPRISE and FORTYTWO software packages running on the DR2new and DR2 full data sets (in brackets). 


Data set comparison Software package comparison 
DR2new vs DR2full ENTERPRISE vs FORTYTWO 
ENTERPRISE (FORTYTWO) DR2new (DR2£ull) 
CURN 1.06 (1.15) 0.0063 (0.0274) 
GWB 1.50 (1.49) 0.006 (0.0229) 
Binned ORF 1.69 (1.68) 0.002 (0.0325) 
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Fig. 2: Difference distributions between two posterior distributions originating from GWB processes. The left panel depicts the 
difference distribution between DR2new and DR2ful1 data sets while employing the ENTERPRISE package. In comparison, the right 
panel shows the tension contour between ENTERPRISE and FORTYTWO software packages when we employ the DR2new data set. 
The plots contain three contours: 1c, 20, and the A contours that correspond to the value of the computed tension. 


Table 4: Median optimal statistic amplitudes and S/N for a single component fit for the monopole, dipole, or HD correlation fixing 
the spectral index of the common signal to 13/3. The uncertainties indicate the 90% credible region. 


DR2full DR2full+ DR2new DR2new+ 
Arg | 4423310" | 3a x10 | 1.54$7 x10! | 4079 x107 
AŽp | —1.9446 x 10°! (-0851x107 | 0.8223x 107°! | 3.9722 x 107°! 
Ans | 2352x109" | 2:997 x10 | 10.043} x 107° 33,9559 x 107° 
S/NwuP LED 13 deae 08r 
S/Np» 0.4 53 0.2055 0.1759 0.6255 
S/Nup Beare 1.471? 501 4.1427 


namely ENTERPRISE and FORTYTWO, as well as different mod- 
els. However, as already indicated by the free-spectrum analy- 
sis, there is a systematic difference in the CRS recovery between 
DR2 full and DR2new. 


We quantify the differences between the power law poste- 
rior distributions that arise from using different software pack- 
ages and data sets by adapting the tensiometer package, out- 


lined in (2021) and summarised in he EPTA 
and InPTA Collaborations|(2023). This package essentially pro- 


vides the probability density function of the parameter differ- 
ences which can be integrated to obtain the mean probability 
for the presence of parameter shifts (see equation 4 in [Raveri &| 
[Doux|(2021)). The resulting probability for a parameter shift can 
be converted into an effective number of os using the standard 
normal distribution. In short, the package produces a score that 
can be interpreted as ‘within how many o” two distributions are 
consistent (see also for more details). The 
results of this analysis in Table |3| indicate that the differences 
are minimal when comparing posteriors between different analy- 
sis software packages (ENTERPRISE vs FORTYTWO) regardless of 
the data set (either DR2full or DR2new). However, when com- 
paring GWB posteriors between different data sets (DR2 full vs 
DR2new), there are tensions of ~ 1o for CURN, ~ 1.40 for HD, 


and ~ 1.60 for Binned ORF, regardless of the software package 
used. 

Figure |2| shows, in the left panel, the two-dimensional 
posterior difference distribution between the ENTERPRISE and 
FORTYTWO posteriors obtained for the DR2new data set, again 
showing consistency of the results provided by the two inde- 
pendent analysis packages. On the contrary, the corresponding 
distribution for the difference in the posteriors associated with 
DR2full and DR2new, shown in the right panel, highlights the 
significant difference between the two data sets, more detailed 
comparisons can be found at the following UR. 

The parameter constraints from the Bayesian pipelines can 
be compared with the results of OS estimates. We first fix the 
spectral index to y — 13/3 and compute the OS amplitude and 
S/N for a CRS with monopole, dipole, or HD correlation. A sum- 
mary of our findings is given in Table] for the three correlation 
patterns in the four different data sets. The best-fit amplitudes 
for the HD correlation from the OS can be compared with the 
Bayesian value found when slicing the posterior at y = 13/3, 
which is Ais = 6.0140 x 10799, We notice that this value sits 
halfway between the OS amplitude estimate for the two data sets, 
with Abs of n x 1079? for DR2 full and 1008); x 1079? for 


^ https: //github.com/subhajitphy/Posterior. comparisons 
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Table 5: Model selection for different inter-pulsar correlation models for a common red signal (CRS). We present Bayes factors 
(BF), for different CRS models against the CURN model. We assume the DE440 SSE fit and use the PSRN+CURN model as the 
reference model. The model component acronyms are: (i) PSRN = individual Pulsar noise, (ii) CURN = common uncorrelated 
red noise, (iii) GWB = gravitational wave background with quadrupolar (HD) angular correlation, (iv) CLK = common signal with 
monopolar spatial correlation, as expected from a clock error, (v) EPH = common signal with dipolar spatial correlation, as expected 


from SSE errors. 


DR2 full DR2full+ DR2new DR2new+ 

ID Model ENTERPRISE FORTYTWO | ENTERPRISE | ENTERPRISE FORTYTWO | ENTERPRISE 
1 PSRN + CURN - - - - - - 

2 PSRN + GWB 4 5 4 60 62 65 

3 PSRN + CLK < 0.01 < 0.01 < 0.01 0.2 1.2 0.3 

4 PSRN + EPH < 0.01 ~ 1074 < 0.01 0.2 0.2 1.3 

5 | PSRN + CURN + CLK 2 1 2.7 0.8 2 1.6 

6 | PSRN + CURN + EPH 1 0.1 1 1 1.6 

7 | PSRN + GWB + CURN 3 3 4 27 13 29 

8 PSRN + GWB + CLK 5 12 7 28 35 57 

9 PSRN + GWB + EPH 3 3 3.6 33 29 43 


DR2new. Both estimates overlap with the Bayesian value within 
their 9096 credible region. The median value for the OS S/N esti- 
mate for a HD-correlated process increases from 1.3 in DR2full 
to 3.5 for DR2new. The A? and S/N distributions of the correlated 
processes as estimated by the OS are shown in Figure B] which 
further highlight the HD correlated signal emerging in DR2new. 


Although we fixed y = 13/3 in the previous analysis, the OS 
can also be computed for a common red process with an arbi- 
trary spectral slope. Figure [4] shows how the OS amplitude and 
S/N of the DR2new data subset change as we vary the spectral 
index y of the CRS model. We increased y from 1.0 to 5.0 in 
steps of 0.5 and also included y = 13/3 to show the expected 
spectral index of a stationary ensemble of inspiralling SMBHBs. 
We evaluate the S/N for the monopole, dipole, and HD corre- 
lations for each y. The median of the HD S/N appears to peak 
around a y of 2.0, broadly consistent with the shallow posterior 
found in the Bayesian analysis (cf. the right panel of Figure p. 
The spread of the histograms, however, means that S/N values 
are self-consistent across the whole range of y. 


4.2. Spatial correlation constraints 


After checking for spectral properties, we reconstruct the spatial 
correlation of the common red signal. The results of the Bayesian 
search for the correlations with ten binned free parameters and a 
common red signal power law are shown in Figure [5] The bins 
are chosen so that each of them contains 30 pulsar pairs. The 
grey-shaded histogram represents the distribution of pulsar pairs 
as a function of separation. Since the pulsar distribution is con- 
centrated in the galactic plane, we have more pairs at small an- 
gular separations compared to an array of pulsars uniformly dis- 
tributed across the sky. However, broad coverage of all angles is 
still achieved with the 25 pulsars chosen using the ranking proce- 
dure of (2023). When comparing the DR2fu11 ORF 
constraints with those of DR2new, one can see that the latter ap- 
pears much more consistent with the expected HD correlation. 
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In particular, the bins around 60, 80 and 135 degrees (i.e. the 
fifth, sixth, and ninth bins) have more positive correlation coeffi- 
cients in DR2 full. These appear to be responsible for the signal 
in DR2full being consistent with a CURN and a monopole, as 
also implied by the OS amplitude and S/N for a monopole corre- 
lation reported in Table f] In contrast, DR2new is very consistent 
with a HD-correlated process. We also use Chebyshev and Leg- 
endre decompositions for the ORF in the Bayesian analysis and 
find ORF and power law constraints that are consistent with the 
binned free parameter analysis presented here; see Appendix [A] 


For comparison, the spatial correlations computed with the 
OS marginalised over the pulsar noise parameters are shown in 
Figure [6] where the correlation coefficients have been obtained 
by scaling to the median amplitude at fixed y = 13/3, as given 
in Table[4] For each noise realisation, only the median values of 
the pulsar pair correlation are used. While Bayesian analysis av- 
erages the correlation within each bin, the OS uses each pulsar 
pair independently and fits the best correlation across all pairs. 
For comparison and visual purposes, we choose the same bin- 
ning and avoid showing 300 individual pulsar pairs. Although 
the two methods give broadly comparable ORF constraints, sev- 
eral differences can be found. Firstly, the first bin with the pul- 
sar pairs with the closest separations deviates away from the HD 
and is consistent with no correlation. Second, the fourth and sev- 
enth bins drop significantly into negative correlations. These dips 
are most prominent in DR2full, while DR2new follows the HD 
curve more closely. Consistent with the Bayesian evaluation, the 
OS reconstruction also shows prominent positive correlations for 
the fifth, sixth, and ninth bins in the DR2full data set, making 
the overall curve inconsistent with HD. 


To quantify how likely the data set is actually showing ev- 
idence for a GWB with HD correlation, we compute Bayes 
factors comparing different spatial correlations: Hellings-Downs 
(HD) correlations that arise from a GWB, monopole correlations 
that could be produced by clock errors (CLK) and dipole cor- 
relations that could be due to SSE systematics (EPH). Firstly, 
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Fig. 3: Amplitude and S/N for a common red signal with y — 
13/3 for the optimal statistic. The top and bottom panels show 
the noise-marginalised distributions of the squared amplitude 
A us and S/N, respectively, for a common signal with differ- 
ent correlation patterns, with HD in blue, monopole in orange, 
and dipole in green. Solid lines are results from DR2new and the 


dashed lines are results from DR2 full. 


the presence of a common uncorrelated red noise (CURN) is 
strongly favoured against a model that includes only individ- 
ual pulsar noises (PSRN). Compared to 3.3 from the 6PSR 
data set, the log10 Bayes factors are ~ 5 for DR2full and 
~ 3 for DR2new. Our main comparison was thus against the 
PSRN+CURN model, with respect to which we referred all 
Bayes factors. These are summarised in Table Both the 
PSRN+CLK and PSRN+EPH models are heavily disfavoured 
against the PSRN+CURN model in both DR2 full and DR2new 
(row IDs 3 and 4 in Table [5). The evidence for these two cor- 
relations as additional processes to the CURN is inconclusive 
(row IDs 5 and 6). Conversely, while DR2 full shows very little 
evidence for a GWB compared to the CURN, DR2new has a sig- 
nificant Bayes factor in favour of HD (row ID 2). Since this is 
a significant result, we have recomputed the Bayes factor using 
several alternative samplers and methods, as described in Sec- 
tion obtaining Bayes factors of 66, 56, 62. In this data set, 
we also find that BFs for models including an additional CLK 


PPP EET 


0.0 0.5 1.0 1.5 01234567 

A? 1e-29 S/N 
Fig. 4: Optimal statistic amplitude (left) and S/N (right) for a 
range of spectral indices for a common red signal using the 
DR2new data set. Results for the HD model are shown in blue, 
the dipole model in green, and the monopole model in orange. 


or EPH or CURN are about a factor of two smaller compared to 
the model including a GWB alone (row IDs 7, 8 and 9). On the 
contrary, the analogue BFs for DR2full are inconclusive with 
an indication for an additional monopole process (row ID 8). 


These BFs can be compared to the S/N from the OS in Table 
[4and Figure B] The DR2new data set yields a median S/N 3.5 
for the HD correlation, while it is about 1.3 in DR2full. These 
S/N estimates act as semi-independent confirmation of the BFs 
from Table |5| Consistent with the slightly higher BF for an ad- 
ditional monopole in DR2ful1 the S/N is « 1.2. For DR2new the 
S/N for a monopole drops to be consistent with zero. Lastly, no 
significant signal for a dipole correlation is found in either of the 
two data sets. 
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Fig. 5: Binned overlap reduction function. Blue is for DR2full while orange is for DR2new. The left panel shows violins of the 
posterior of the correlation coefficients averaged at ten bins of angular separations with 30 pulsar pairs each. The black line is the 
HD curve based on theoretical expectation of a GWB signal. The grey histogram is the arbitrarily normalised distribution of the 
number of pulsar pairs at different angular separations. The right panel is the corresponding 2D posterior for the amplitude and 
spectral index of the common correlated signal, showing 1/2/3 o contours. 
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Fig. 6: Constraints on the overlap reduction function from the 
optimal statistic. Blue and orange points indicate the results for 
DR2 full and DR2new respectively. The correlation coefficients 
for each pair of pulsars are weighted and averaged following the 
description in and grouped in the same 
way as those in Figure|5|for comparison. The HD correlation is 
plotted as a black line for reference. 


4.3. Significance tests 


To quantitatively estimate the significance of the hypothesis that 
a GWB signal with HD correlation is present in the data, the null 
hypothesis distribution need to be constructed. Many repetitions 
of an experiment need to be performed in order to define a strict 
p-value. This is, unfortunately, not possible for PTAs. Thus, we 
can only attempt to find a good proxy to estimate the true statis- 
tical p-value for the null hypothesis. In the following, we refer 
to the estimated value from our proxy methods as p-values for 
simplicity. The respective distributions can be constructed in two 
different ways, by introducing random phase shifts in the Fourier 


basis of the common red noise process (Taylor et al.|2017) or 


by moving the positions of the pulsars in the sky via a random 


scramble (Cornish & Sampson|2016). The aim of both methods 
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is to effectively destroy the distinctive cross-pulsar correlations, 
unique to the GWB signal, while retaining the individual pulsar 
noise characteristics. One should emphasise that both methods 
should be robust against any mismodelled features in the data 
set, therefore they, in general, provide more conservative esti- 
mates of the significance in comparison to the possibly oversim- 
plified noise simulation bootstrapping. 


The distributions of BFs under the null hypothesis (PSRN + 
CURN) were constructed for DR2 full and DR2new using about 
200 and 2000 phase shifts, respectively and are displayed in the 
upper panel of Figure|7| The DR2full measured BF from Ta- 
ble [5] lies within the 20 range of the null hypothesis distribu- 
tion with a p-value of 0.04. The p-value for the BF derived with 
the DR2new data set reaches a statistically interesting value of 
0.0005, which corresponds to the 3c level of significance ('ev- 
idence’). The analysis was performed using both ENTERPRISE 
and FORTYTWO and shows consistent results between the two 
software packages. This significance test was repeated for the 
OS S/N values for the HD correlation and results are shown in 
the bottom panel of Figure |7| For DR2full a p-value of 0.07 
is found. None of the 10000 realisations produced a S/N that is 
comparable to what has been found in DR2new. Therefore, only 
an upper limit can be set for the p-value « 0.0001, which corre- 
sponds to a significance of > 3.50. 


Figure[B|shows the null distribution obtained with sky scram- 
bles in the OS analysis in the top panel. A matching threshold of 
0.2 for any two sky scrambles was imposed to produce about 
5000 samples. A large difference particularly in the high S/N 
tail of the density functions can be found between DR2 full and 
DR2new. The p-value for DR2 full of 0.08 is comparable to that 
obtained with the phase shifts. This could indicate that in the low 
S/N regime, both methods produce reliable null distributions. In 
the high S/N regime, however, with DR2new the sky scramble 
p-value of 0.004 is not consistent with the phase shift method. 


The bottom panel of Figure [B] compares p-values from sim- 
ulations, theoretical computation and the two methods. A null 
distribution was generated using a set of realistic simulations re- 
sembling the statistical properties of the real DR2new data set 
and with the injected CURN only. The noise parameters as well 
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Fig. 7: 1-cumulative density function (CDF) from phase shifts. 
The top panel is for the Bayes factors for PSRN--GWB versus 
PSRN+CURN using ENTERPRISE (EP) and FORTYTWO (42). It 
should be noted that due to the computational cost, we calcu- 
lated only a limited number of phase-shifted BFs. This could 
explain the differences in the 1- CDFs. The OS S/N for the HD 
correlation with ENTERPRISE is shown in the bottom panel. Blue 
lines are for DR2full while orange lines are for DR2new. Verti- 
cal lines are the measured Bayes factor for PSRN--GWB versus 
PSRN+CURN reported in Table [5]or the OS S/Nup reported in 
TableH]respectively. The estimated p-values for each method are 
given in the legends. 


as the amplitude and slope of the CURN for the null distribu- 
tion were taken as random samples from the posteriors of the 
CURN search with DR2new. Additionally, we compare these p- 
values with those obtained from the theoretical null distribution 
described by generalised y? (GX2) distributions and derived in 
[Hazboun et al.|(2023). The distribution is computed by fixing the 
noise parameters to the median values of the posteriors?| 

Both null distributions are compared to the proxy distribu- 
tions obtained via phase shifting and sky scrambling. For con- 
sistency, instead of using the real data set, a target data set was 
simulated using the same procedure as the simulations for the 
null distribution, but with GWB injected instead of CURN. The 
lowest p-value of 0.002 is obtained using phase shifts. The most 


> This difference in how the parameters for the pulsar noise are cho- 
sen between the simulated and theoretical null distributions can explain 
the more conservative p-value for the former. As the randomly drawn 
parameters for the simulations could allow for more large S/Ns to be 
found by chance. 
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Fig. 8: I1-cumulative density function (CDF) from sky- 


scrambled optimal statistic HD S/N distributions and a compar- 
ison between different methods. In the top panel, the blue his- 
tograms are for DR2full while the orange histograms are for 
DR2new. Vertical lines are the measured S/Ngp values reported 
in Table[4] The large discrepancy between the two data sets could 
be an indication that some remaining signal is still present after 
the scrambling, especially in the strong S/N regime. More checks 
need to be performed to assess the validity of this method. The 
bottom panel compares a simulated null distribution in cyan, the 
generalised y? (GX2) distribution from in 
purple and the two proxy methods of phase shifting 1n orange 
and sky scrambling in green. At the measured value from the 
DR2new, the two methods differ by a factor of a few. The esti- 
mated p-values for each method are given in the legends. 


conservative number is obtained when using all sky scrambles 
without introducing any threshold or weighting with the p-value 
of 0.008. At S/Nup = 3.5 the p-values of the simulated (0.006) 
and theoretical (0.003) null distributions lie between those from 
phase shifting and sky scrambling, see Figure[8] We have tested 
introducing thresholds on the match between the true and scram- 
bled pulsar positions and amongst the scrambles themselves and 
found that in general smaller thresholds lead to a decrease in the 
proxy null distribution. Similar results are obtained when adding 
weights to account for the different contributions from each pul- 
sar. From all the above we can conclude that the p-value for the 
S/N found with DR2new should be ~ 0.004, which was also ob- 
tained with sky scrambling at a threshold of 0.2 and no weights. 
The inconsistencies between the p-values obtained with the real 
and simulated target data sets can be due to the incompleteness 
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of our simulations (e.g. exponential dips for J1713+0747 are not 
included and a simpler noise model with a fixed number of fre- 
quency components was used). 

For an optimal sky scrambling orthogonality may need to 
be ensured between different realisations. Additionally, a real- 
istic PTA does not have equally good pulsars, which should be 
taken into account when assessing the match between different 
scrambles. This can limit the maximum number of possible sky 
scrambles or their effectiveness in breaking correlated processes 
(Di Marco etal. 023). On the other hand, Hazboun et al. (2023) 
have shown that sky scrambling can lead to null distributions that 
are consistent with the theoretical prediction. Further studies are 
required to determine whether any method can be a good con- 
servative proxy for PTA experiments to accurately estimate the 
p-value and significance of a detected signal. 


4.4. Consistency tests 


4.4.1. Comparing the power in the auto-correlation and 
cross-correlation terms. 


For a true GWB both the auto-correlation and cross-correlation- 
terms should constrain the same process. Since the cross- 
correlated power is proportional to the square of the expectation 
value of the ORF, that is I?, which is always « 1, itis expected 
that the power in the auto-correlation terms — equivalent to the 
CURN - is the dominant contributor to the signal. However, we 
stress that the cross-correlation terms contain the ‘smoking gun’ 
that can provide conclusive evidence for the presence of a GWB 
in the data. 

We apply the split likelihood technique 
Antoniadis et al.|2022) to both DR2 full and DR2new. Fig- 
ure |9| shows the resulting posterior contours. While the auto- 
correlation-terms-only-analyses recover the CURN with consis- 
tent amplitude and slope for both data sets, noticeable differ- 
ences can be seen for the power law parameters that are con- 
strained using the cross-correlation terms only and assuming the 
HD correlation. For the DR2full data set, the cross-correlation 
terms contain virtually no power and thus only an upper limit is 
found for an HD-correlated signal. On the contrary, consistent 
with the much larger evidence for a GWB, the cross-correlation 
terms in DR2new produce a peak in amplitude. Since a long tail 
still remains, one cannot rule out the possibility of a zero ampli- 
tude. 


4.4.2. Solar System Ephemeris systematics 


The effects of SSE systematics on the PTA GWB search have 
been studied in{Tiburzi et al.](2016); [Guo et al.|(2019) and mod- 
els that can help mitigate the dipolar correlated signal induced by 
SSE systematics have been added to the tools for PTA analysis. 


We employ the widely used BAYESEPHEM model 
2018, |Vallisneri et al.|2020) on the EPTA-only data sets. 


Figure|10O|compares the addition of BAYESEPHEM to each data 
set against the use of the DE440 fit alone. Allowing SSE system- 
atics to be present and absorbed by a model is a more conserva- 
tive approach. In general, the left panels show that certain fre- 
quency bins have lower power compared to the DE440 analysis. 
This in turn broadens the power law posteriors. Comparing the 
DR2full data set against the DR2new data set, the longer time 
span of DR2full helps to produce results that are more indepen- 
dent of the SSE fitting used, while the short time span of DR2new 
strongly limits its ability to separate SSE systematics from other 
common signals. In fact 10.3 years is close to the Jovian orbital 
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Fig. 9: Power law posterior constraints from a split analy- 
sis using the auto-correlation terms (CURN) and the cross- 
correlation terms (assuming HD correlation) separately. The left 
two columns show the recovered power law from the auto- 
correlation terms and the right two columns show the cross- 
correlation terms. Blue distributions are for DR2full while or- 
ange distributions are for DR2new. 


Table 6: Bayes factors for different CRS models against the 
CURN model adding BAYESEPHEM to all models with the 
ENTERPRISE pipeline. The model component acronyms are as 
in Table[5] 


ID Model DR2full | DR2new 
] | PSRN - CURN - - 

2 | PSRN +GWB 1.5 17 

3 PSRN + CLK 0.5 2 

4 PSRN + EPH < 0.01 0.4 


period of about 12 years, which could lead to a strong degener- 
acy. 

Adding BAYESEPHEM also affects the BF for the different sig- 
nal models. Table[6]shows a selection of the same models as Ta- 
ble[5] The most relevant effect in the search for a GWB is that the 
BF in favour of the HD correlation in DR2new is reduced from 
about 60 to 17, a factor of about 3. Since BAYESEPHEM is known 
to partially absorb power from the GWB, this reduction follows 
the expectation, although the exact amount is difficult to predict 


(Vallisneri et al.|2020). 


4.4.3. Distinguishing a common red spectrum from similar 
individual pulsar noise spectra 


It was shown in (2021b) that the CURN hypoth- 


esis may become strongly favoured over the PSRN hypothesis 
even when spectra of temporal correlations across pulsars are 
similar and yet not strictly common, as implied by the CURN 
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Fig. 10: Spectral properties of a GWB signal in the style of Figure [1] for the DR2full (top) and DR2new (bottom) comparing the 


analysis without (blue) and with (orange) BAYESEPHEM. 


model. This is because the standard uniform prior distributions 
for power law parameters of pulsar-intrinsic red noise do not 
match the observed distributions of these parameters. The issue 
was addressed in {Goncharov et al.] (2022b), where the authors 
introduced a hierarchical model that governs the distribution of 
power law amplitudes across pulsars in the CURN. As the dis- 
tribution width is consistent with zero, this indicates the likely 
presence of a common-spectrum stochastic process in the data. 
Dropout analysis, (e.g. also enables 
the mitigation of this issue by identifying pulsar outliers. The 
effect of the range of simulated pulsar noise parameters on the 
spurious evidence for CURN is demonstrated in Figure 6 in [Zic] 
fet al](2022), 

To test each pulsar’s consistency with the CURN, we mea- 
sured both the dropout factors and the distribution of power law 
amplitudes of the CURN spectrum in the pulsars. The results are 
shown in Figure[11] The two top panels show the measurements 
Of Mog, A and Hoga» the standard deviation and the mean of 
the power law amplitude of CURN measured in the pulsars, fol- 
M SUA SU | n) As expected, the mean log, A 
is consistent with the measurement of log; Acurn. At the same 
time the standard deviation is consistent with zero, confirming 
the presence of common temporal correlations in the data. Based 
on the measured dropout factors, we find that in both data sets 
only a few pulsars have intrinsic red noise that seems to be incon- 
sistent with the CURN. The majority of pulsars display indiffer- 


ent behaviour, leaving around five pulsars to contribute most to 
the constraints of the CURN power law. However, the two data 
sets have CURNs with very different posterior contours. This can 
have an impact on the difference between certain pulsars agree- 
ing more or less with the CURN. J1909—0747 and J1744—0744, 
for example, have very steep intrinsic red noise, thus they are 
more consistent with the steep CURN from DR2 full compared 
to the shallow CURN from DR2new. More discussion on the in- 


trinsic pulsar noise properties can be found in |the EPTA and 
InPTA Collaborations) (2023). 


4.5. Continuous GW signal search 


In addition to the GWB search, we have also searched our data 
for the presence of a continuous GW (CGW) signal from an in- 
dividually resolvable SMBHB in a circular orbit. This subsec- 
tion presents a preliminary analysis and a detailed investigation 
(which includes simulations and significance estimation) will be 
given in a separate paper. The main aim of this section is twofold: 
(i) to look for an alternative explanation of the observed common 
signal and (ii) to understand how the inclusion of the CGW sig- 
nal in the model affects the main findings of our analysis. 


First, we performed an analysis using the DR2ful1 data set. 
The addition of the CGW signal to the custom pulsar noise and 
CURN is not informative about the presence of a CGW, with a 
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Fig. 11: Tests of the CURN model. The top two panels show that pulsar spectra of temporal correlations attributed to CURN 
are indeed consistent with representing the same spectrum (Goncharov et al.|2022b). The blue lines show the measurement for 
DR2full, whereas the orange lines show the measurement for DR2new. The top left panel is the measurement of the standard 
deviation of the CURN amplitudes across pulsar spectra, Clog 4, marginalised over the mean of the CURN amplitudes, Log, a, 
as well as pulsar-intrinsic noise parameters. The dashed vertical line denotes an upper limit at 95% credibility. The top right panel 
shows both the mean and the standard deviation. The inferred mean value is consistent with the measurement of log;, Acurn denoted 
by a dashed vertical line. The bottom panel shows the logarithm of the dropout factors which suggest 
pulsars’ contribution to the CURN model. Positive values represent support for the CURN model, whereas negative values point 
towards inconsistency of pulsar data with the CURN model. The differences in the dropout factors between the data sets could be 
due to differences in the recovered CURN or the pulsar red noise spectral properties. 


Bayes factor BFEGW+CURN = 0.5 (model containing CGW and 


sampler and method described in (2022). The candi- 
CURN over the model with CURN only). 


date source is localised in sky position and frequency. We have 
Next, we move on to the analysis of the DR2new data set. We also computed the F, statistic (a frequentist approach; see[Babak| 
started by considering a simple model of a CGW source super- Ellis et al.[2012). The results are shown in Figure 
imposed on PSRN. We found substantial support for the pres- The Fe statistic can be marginalised over the individual pul- 
ence of a CGW: the BF of the model CGW+PSRN over the null sar noise parameters in a similar manner as the OS. It is shown in 
model, PSRN, is 200 with the Earth term only and 260 if wein- blue with the mean value corresponding to S/N~ 4.5. We com- 
clude the pulsar term in the search. For this search we used the Puted the corresponding p-value (~ 0.1% ) by scrambling the 
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Fig. 12: F,-statistic for a CGW search marginalised over the 
noise uncertainties (blue). The F, obtained from the analysis 
of the data with scrambled sky position of pulsars is shown in 
orange and it is compared with theoretical Q2 distribution in 
black. 


pulsar sky positions, assuming 0.2 match as the orthogonality 
criterion. The F,-statistic takes only the Earth-term into account 
and scrambling the pulsar's position destroys the coherence of 
the CGW, while preserving the noise properties. The F, of sky 
scrambles is shown in orange and is compared to the theoreti- 
cally expected (for Gaussian noise) central y? distribution with 
four degrees of freedom. 

Including the CURN in the model does not change these 
results significantly: we can still identify the CGW candidate 
with BFSGW+CURN = 7 — 20, with the exact value depending on 
whether one includes the pulsar term (BF = 7) or only the Earth 
term, as well as on the sampling method used (BF = 12,20). 
Interestingly, the CURN parameters are much less constrained 
in the CGW+CURN model. We show partial results of the 
Bayesian parameter inference in Figure[13] 

Finally, we have considered a model containing a GWB plus 
a CGW and found that the GWB ‘absorbs’ most of the CGW, 
that is, the CGW becomes poorly constrained. At the same time, 
our results indicate that we cannot exclude the presence of the 
CGW, since the BF is not informative (BF e = 1.2), while the 
CGW model has a larger parameter space. We note that the iden- 
tified CGW candidate frequency of ~ 5 nHz is between the two 
lowest frequency bins that dominate the HD signal, as shown in 
Fi gure[1] The rest of the bins do not contribute much to the GWB 
signal, due to the relatively short observation span of the DR2new 
data set and the high level of white noise. To summarise, we find 
that the observed data is equally well explained by either a GWB 
or a single CGW. However, given the additional number of pa- 
rameters for the CGW model and in the absence of additional 
data we favour the simpler GWB model. A detailed analysis in- 
cluding extensive simulations will be presented in a forthcoming 
publication. 


5. Search results on the EPTA+InPTA data sets 
DR2full+ and DR2new+ 


Analyses of DR2full+ and DR2new+ data sets including the 
InPTA data indicate general consistency of the results with the 
DR2full and DR2new data sets, respectively. We provide a com- 
parison of posteriors for the CURN and GWB between the data 
sets, with and without InPTA data, in Tables [7] and [8| as well 
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Fig. 13: Inference of the frequency and amplitude of a puta- 
tive CGW in the CGW+CURN model. Plotted are the 50% and 
90% credible regions. Dark-cyan contours are obtained using the 
Earth term only and ENTERPRISE with PTMCMCSAMPLER, purple 
contours with Eryn sampler, while coral contours are produced 


using a sampler described in (2022) and also include 


the pulsar term. 


as in Figure[14] see also Appendix |B] While the DR2full+ and 
DR2full produce very consistent posteriors, a ~ 0.2c difference 
can be found between DR2new- and DR2new. The impact of the 
InPTA data is more noticeable in the shorter data set. 

The results for the OS shown in Table [4]are consistent with 
the EPTA-only data sets. Both DR2full- and DR2 full give sim- 
ilar amplitudes and S/Ns for the three correlation models. An 
increase in the S/Ns of about 0.5 for monopole, dipole and HD 
correlations can be found with the DR2new- against the S/Ns in 
the DR2new. 

The corresponding BFs are in general agreement with the 
EPTA-only results (cf. Table [5). The BF between GWB and 
CURN in the DR2new+ of 65 is comparable to the 60 from 
DR2new. However, when testing for additional processes, we find 
significantly larger BFs for PSRN+GWB-+CLK (row ID 8) and 
PSRN+GWB-+EPH (row ID 9): 57 versus 28 and 43 versus 33, 
DR2new+ versus DR2new, respectively. The small BF difference 
between the models including the CLK or EPH error terms and 
the GWB-only model further supports the assumption that the 
signal could be a GWB. 

Finally, we investigate the effect of the additional InPTA 
on the contribution of the individual pulsars to the CURN 
via the dropout factor in Figure As with the previous re- 
sults, most dropout factors are consistent between the DR2ful1+ 
and DR2full data sets. J1600—3053 shows an increase in the 
dropout factor, possibly due to better single pulsar noise mod- 
elling with the addition of the InPTA data. For the new genera- 
tion, EPTA-only and EPTA+InPTA data sets the differences are 
more pronounced. Most pulsars have dropout factors that are in 
agreement. Two pulsars, J1713+0747 and J0613—0200, have re- 
duced dropout factors, whereas J1909—3744 shows an increase. 

In summary, the results from the EPTA and InPTA com- 
bination are in broad agreement with the EPTA-only data set. 
The consistency between DR2full- and DR2full shows the ro- 
bustness of the results from the full EPTA+InPTA combination. 
When comparing DR2new- and DR2new the effects of the InPTA 
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Fig. 14: Difference distributions of posteriors while including the InPTA data. The left panel is associated with posteriors from 
DR2full- and DR2full. In contrast, the right panel involves the comparison between DR2new- and DR2new. Both panels show the 
tension for the GWB model. These plots provide three contours: 10, 20, and the A that represent the computed tension value. 


Table 7: 90% credible regions for the constraints of power law parameters in the different Bayesian analyses with DE440 for both 
DR2full- and DR2new+, which include the addition of InPTA data. 


DR2full+ DR2new+ 
Software + Model log;o Acrs YCRS log jy Acrs YCRS 
0. 0.40 0.33 1.12 
ENTERPRISE + CURN | -1444*05). 3.981045 | —14.30+033 3.530; 
0.18 0.39 0.2. 1.02 
ENTERPRISE + GWB | —14.48+918 4.061033 | —14.10702; 3.031192 


Table 8: Tensiometer package based discrepancies between 
GWB posteriors that arise from DR2full+, DR2 full, DR2new+, 
and DR2new. The entries provide the Z-score (in number of c). 


DR2full+ vs DR2full | DR2new+ vs DR2new 
CURN 0.037 0.229 
GWB 0.032 0.121 


data are more visible with differences in the power law posteri- 
ors, increased BFs and higher OS S/Ns for the GWB, but also 
for other possible noises. Further investigation is needed to as- 
sess and improve the combined sensitivity for GW searches. 


6. Discussions and conclusions 


The EPTA with its six telescopes and multiple observing systems 
has collected PTA observations for almost 25 years and using a 
new generation of observing systems for the last decade. For the 
DR2 we have increased the number of pulsars combined with the 
most recent observations from 6 to 25. A selection scheme has 
been applied to find the 25 pulsars that are sufficient to contribute 
about 9596 of the expected sensitivity of the full array with 42 
pulsars from DR1 (Speri et al.[2023). Here, we used the optimal 
timing and noise models obtained in 
(023); (2023) to search 


for a stochastic GWB. In addition, we combined data for ten 


common pulsars between InPTA DR1 (Tarafdar et al.)2022) and 
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EPTA DR2 and conducted GWB searches also on those extended 
EPTA--InPTA data sets. 

We present the main results of the GWB search using 
two versions of the EPTA 25-pulsar data set, the full data set 
(DR2full) with a time span of 24.7 yr and a new backends-only 
data set (DR2new) with a time span of 10.3 yr. Both data sets 
measure a common red signal. By virtue of its length, the full 
data set yields a better constraint on the spectral properties of 
the GWB. However, the new backends-only data set provides a 
better measurement of the cross-correlated power. In the follow- 
ing, we give a summary of the results of our analysis and discuss 
possible reasons for the discrepancies between the two data sets. 

The power law amplitude for an HD-correlated process us- 
ing DR2full is log;gA = —14.54*078, with a spectral index 
y= 4.19797. that is close to the expected value of 13/3 for 
a GWB from circular SMBHBs driven by GW emission. With 


DR2new we obtain a flatter spectrum with log,) A = —13.94*022 


-0.48 
and y = 2.715175. Fixing the spectral index to 13/3, the am- 


plitudes for the two data sets are consistent, with values around 
logio A = —14.617011. This indicates that the two data sets con- 
strain the same amount of power in the GWB, although the de- 
tailed spectral shape appears to be different. 

The free spectrum analysis provides the possibility to di- 
rectly compare the results from different data sets in the fre- 
quency domain. Ignoring the lowest 1/24.7 yr^! frequency bin of 
DR2full, the remaining bins show good consistency with those 
of DR2new. Including the lowest frequency bin in the power law 
fiting may lead to a steepening of the power law. With a shorter 
data span DR2new probes a different frequency range starting at 
about two times higher than 1/24.7 yr !. Power law fitting could 
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also be affected by the frequency bins just above 1075 Hz, result- 
ing in a flatter spectrum for DR2new. This could either indicate 
that a power law model does not provide a good fit to the com- 
mon signal, or there is additional noise or signal around 1075 Hz 
or 107? Hz. 

The differences between the Bayesian correlation curves ob- 
served in the two data sets are most obvious around angular sepa- 
rations of 60 deg, 80 deg and 135 deg. The correlation curve pro- 
duced by DR2 full shows a prominent monopolar-like structure, 
with a central part shifted upward compared to the HD curve, 
while the correlation curve produced by DR2new follows the 
HD correlation much more closely. These results are in agree- 
ment with the measured BFs (PSRN+GWB vs PSRN+CURN), 
which are four and 60 for DR2full and DR2neu, respectively. 
From the null hypothesis distributions of BFs (PSRN+GWB vs 
PSRN+CURN) constructed with phase shifts, we can infer a p- 
value of 0.04 for DR2 full and 0.001 for DR2new. There is es- 
sentially no evidence for other correlation patterns or additional 
common processes in either data set, with perhaps the exception 
of a tentative hint of an extra monopole in the DR2 full data set, 
which will be the subject of further studies. 

Optimal statistic analysis shows even more significant dis- 
crepancies between the two data sets. For DR2full the squared 
amplitude is only 2.7 x 1079, which is lower than the ampli- 
tude from the Bayesian analysis (A? = 6.02 x 107%). There is a 
large scatter in the correlation coefficients, giving a similar S/N 
of around 1.2 for both the HD and the monopole correlations. 
The p-value for the S/N of the HD correlation is 0.07 from phase 
shifts and 0.08 from sky scrambles. For DR2new the squared am- 
plitude is 1.0 x 10779, which is more consistent with (yet higher 
than) the Bayesian result. The correlation coefficients also match 
the HD curve much better. The S/N for the HD correlation in- 
creases to 3.5, while the S/N for the monopole correlation drops 
to almost zero. Sky scrambles give a p-value of 0.002 for HD 
correlation, while phase shifts yield a p-value « 0.0001. This 
corresponds to > 3c' significance of the GWB signal. 

Preliminary analysis including a CGW suggests that its con- 
tribution to the observed HD-correlated power cannot be ruled 
out. The presence of a CGW is not supported in the DR2full 
data set; its presence is preferred over CURN only in DR2new. 
The source amplitude and frequency are well-constrained. The 
candidate is also localised in the sky, but its position and error 
region depends on whether we include the pulsar term. However, 
adding a GWB to the analysis absorbs most of the power of the 
CGW, preventing any strong claim about its actual presence in 
the data. A more thorough analysis involving the CGW model 
will be presented in a separate paper. 

The analysis of the combined EPTA DR2 and InPTA DRI 
shows broadly consistent results with the EPTA DR2 alone. The 
power law parameter constraints with DR2full- show little dif- 
ference to those without the InPTA data. For DR2new-, the effect 
of the additional InPTA data is more pronounced. The power law 
parameters experience a small shift of 0.170 towards a steeper 
spectral index. The BFs and OS S/Ns are also in general agree- 
ment with the EPTA-only data sets. Increases in the evidence for 
additional monopole and dipole correlated signals of about 0.5 
can be found in DR2new+. A larger impact on the shorter data set 
can be expected, since the InPTA, with three years of time span, 
is a more significant fraction of DR2new (10.3 years) compared 
to DR2full (24.7 years). 

With the high amplitude and large uncertainty in the spectral 
index, the observed HD correlated signal is broadly consistent 
with the expectation from a cosmic population of SMBHBs. In 


particular, as shown by |Middleton et al.) (2021) the high ampli- 


tude of logA = —14.61 inferred when fixing y = 13/3 is consis- 
tent with the recent discovery of over-massive black holes (e.g. 


McConnell et al.|2011) and the upward revision of the normali- 
sation of the SMBH-host galaxy relations (see, e.g. | McConnell 
[& Ma|2013| [Kormendy & Ho|2013). It is not straightforward, 


however, to construct a self-consistent SMBH and host galaxy 
cosmic evolution model that results in such a high GWB sig- 
nal fulfilling other observational constraints on the SMBH mass 
function and on the evolution of the bolometric quasar lumi- 
nosity function with redshift (Izquierdo- Villalba et al.]2022). A 
spectrum significantly flatter than y — 13/3 can arise for a num- 
ber of different reasons, including strong coupling with the en- 
vironment, the predominance of highly eccentric SMBHBs (see 


e.g. [Sesana|2013) or simply by the presence of extra power at 


high frequencies due to sparse and loud marginally resolvable 


individual binaries (Middleton et al.|2021). Besides a cosmic 
population of SMBHBs, the detected signal can be generated by 


processes occurring in the early Universe (Caprini & Figueroa 
2018) as well as specific models of dark E EIE ASI 
Ed We plan to investigate the implications of this signal for 
all these formation scenarios in follow-up papers. 

Our results seem to indicate that DR2new provides better con- 
straints on the cross-correlated power than DR2full. It would 
normally be expected that the addition of more data would lead 
to a more significant detection of a stationary process. There are 
a few possible factors that could be contributing to this discrep- 
ancy that needs to be investigated in more detail. 


1. Lower quality of the early data, which lacks multi-radio fre- 
quency coverage and polarisation calibration, may have al- 
lowed for residual unmodelled noise. This can lead to dif- 
ferent noises and signals being recovered with the early and 
new generation observations. Investigating better noise mod- 
elling can help to increase the sensitivity and reliability of the 
early data. 

2. Improper weights for the power law fitting in different fre- 
quency bins could introduce bias in the recovery of the spec- 
tral properties. In particular, the lowest frequency bins only 
have contributions from a few pulsars with the longest time 
spans, but their weights are the highest since the largest 
amount of power in a common red signal is at the lowest 
frequencies. Considering a weighting scheme for the differ- 
ent frequency components for the power law fitting could 
produce an unbiased result. 

3. The presence of excess power at low frequencies can lead 
to a steepening of the power law. While excess noise at high 
frequencies can make the spectrum appear shallower. In both 
cases, noise leaks into the GW signal giving erroneous power 
law constraints. 

4. Non-stationarity of the pulsar noise or of the putative GW 
signal can cause the measured spectral properties to be dif- 
ferent between the early and late part of a data set as the 
properties of the noise and signal could evolve over time. 


Some of these differences between pulsars are smoothed out 
in the DR2new data set, as all pulsars have roughly the same time 
span ~ 10 yr. This may help to measure the cross-correlations 
between pulsar pairs more robustly. 

An extensive simulation campaign is ongoing to help to bet- 
ter understand the features of our data sets and to build more 
confidence in the internal consistency of our findings. Verifica- 
tion of the analysis algorithms and their performance on a real- 
istic PTA data set is needed to set our expectations. These can 
then be compared against the real data set to test the effects that 
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data quality and the noise/signal properties have on the results of 
the final analysis. Several simulation projects tackling different 
questions will be published in separate works. 


Concurrent efforts from the NANOGrav collaboration on 
their 15-year data set (Arzoumanian et al., 2023) and the PPTA 
on their DR3 (Reardon et al., 2023) provide independent results 
on the search for a gravitational wave background. These will 
be compared in the IPTA framework to increase our confidence 
and prepare for the next IPTA data set. Additionally, the CPTA 
is preparing its first data release and analysis for a GWB signal 
(Xu et al., 2023). 


Moving forward, we plan to add more pulsars timed with 
the new backends to DR2new. The EPTA data set is unique in 
its combination of time span, cadence, number of pulsars and, 
when combined with the InPTA data set, in DM monitoring. In- 
deed, among the pulsars timed by EPTA, there are more than 
30 sources observed with new backends for a time span » 8 
years displaying RMS< 2 us, which can add significant value to 
the EPTA data set. A combination of the resulting data set with 
NANOGrav15, PPTADR3 and MeerKAT DRI under the aegis 
of the third data release (DR3) of the IPTA, will produce a data 
set of unprecedented sensitivity that will help to pin down the 
nature of the signal presented here and to constrain its proper- 
ties. 
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Appendix A: Chebyshev and Legendre 
decomposition for the ORF 


In this Appendix, we present the spatial correlation measured 
among pulsar pairs as reconstructed from: i) a third order Chebi- 
shev polynomial decomposition and ii) a fifth order Legendre 
polynomial decomposition. Results are shown in Figures 
and[A 2 respectively, which highlight the broad consistency with 
the average HD correlation expected for a GWB and with the 
binned Bayesian reconstruction shown in Figure[5|and discussed 
in Section 


Appendix B: Comparison of the four data sets 


In this Appendix, we present the constraints for the free spec- 
trum, power law and binned correlations in Figures and 
[B -2]for the four data sets used in this work: DR2 full, DR2new, 
DR2full- and DR2new+. 
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Fig. A.1: Overlap reduction function reconstructed using Chebyshev polynomial. The figure is in the style of Figure [5] with the 
difference in the left panel being that the dashed and dotted lines indicate the central 95 and 99.7 % credible regions of the recon- 
structed spatial correlations. 
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Fig. A.2: Overlap reduction function reconstructed using Legendre polynomial. The figure is in the style of Figure[5] with the differ- 
ence in the left panel being that the dashed and dotted lines indicate the central 95 and 99.7 96 credible regions of the reconstructed 
spatial correlations. 


log10 RMS (s) 


Frequency (Hz) Y 


—— DR2full HD —— pR2new HD —— DR2full+ HD —— DR2new+ HD 


Fig. B.1: Spectral properties of a CRS signal assuming HD correlations in the style of Figure|I]for all four data sets. 
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Fig. B.2: Binned overlap reduction function in the style of Figure[5|for all four data sets. 
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